# Chinese Multimodal Understanding
Chinese CLIP ViT Large Patch14
Chinese CLIP model based on the ViT architecture; supports Chinese vision-language tasks.
Image Classification
Transformers

OFA-Sys
2,333
32
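Because Chinese CLIP is a CLIP-style model, it is typically used for zero-shot image classification. The image embedding is compared with embeddings of candidate text labels, and the labels are ranked by similarity. The plain-Python sketch below shows only that scoring step, under stated assumptions. The 3-dimensional vectors are made-up stand-ins for real encoder outputs. With Hugging Face Transformers, those outputs would typically come from `ChineseCLIPModel.get_image_features` and `get_text_features`. The logit scale of 100 is an assumed value, not one read from this checkpoint. The helper name `zero_shot_probs` is hypothetical.

```python
import math

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Hypothetical helper: score one image embedding against label-prompt embeddings.

    logit_scale=100.0 is an assumption, not a value taken from the checkpoint.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    img = normalize(image_emb)
    # Cosine similarity between the image and each label prompt, scaled into logits
    logits = [logit_scale * sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    # Numerically stable softmax over the candidate labels
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings, invented for illustration; real ones come from the image and text encoders
image_emb = [0.9, 0.1, 0.0]
labels = ["一张猫的照片", "一张狗的照片"]  # "a photo of a cat", "a photo of a dog"
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best, [round(p, 4) for p in probs])
```

Normalizing both sides makes the score a cosine similarity, so label prompts of different lengths or magnitudes compete on direction alone.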
Mengzi Oscar Base
Apache-2.0
A Chinese multimodal pretraining model built on the Oscar framework. It is initialized from the Mengzi-BERT base model and trained on 3.7 million image-text pairs.
Image-to-Text
Transformers
Chinese

Langboat
20
5